Enterprise Database Systems
Big Data Development with Apache Spark

Apache Spark SQL

Course Number: df_apsk_a02_it_enus

Lesson Objectives

Apache Spark SQL

  • start the course
  • describe Apache Spark SQL
  • create a SparkSession
  • create DataFrames with Spark SQL
  • use aggregations with the built-in DataFrame functions
  • run SQL queries programmatically
  • create a global temporary view
  • create Datasets with Spark SQL
  • use JSON Datasets with Spark SQL
  • use Load/Save functions
  • manually specify a data source
  • run SQL directly on files
  • use SaveMode to handle save operations
  • write Parquet files with Spark SQL
  • use Spark SQL to save a DataFrame as a persistent table
  • use partitioning when saving persistent tables
  • use Spark SQL to create Datasets and DataFrames

Overview/Description
In this course, you will be introduced to Apache Spark SQL, Datasets, and DataFrames.

Target Audience
Programmers and developers wishing to perform big data development using Apache Spark 2.2.

Introduction to Apache Spark

Course Number: df_apsk_a01_it_enus

Lesson Objectives

Introduction to Apache Spark

  • start the course
  • describe Apache Spark 2.2 and the main components of a Spark application
  • download and install Apache Spark 2.2
  • download and install Apache Spark 2.2 on macOS
  • build Apache Spark using Apache Maven
  • use the Spark shell to analyze data interactively
  • link an application to Spark
  • identify the three locations Spark provides to configure the system
  • create a SparkContext to initialize Apache Spark
  • describe how Spark runs on clusters and the three supported cluster managers
  • use the Apache Spark shell

Overview/Description
In this course, you will be introduced to Apache Spark. You'll learn how to download and install Apache Spark, and you'll also learn how to build, configure, and initialize Spark.

Target Audience
Programmers and developers wishing to perform big data development using Apache Spark 2.2.

Spark Monitoring and Tuning

Course Number: df_apsk_a04_it_enus

Lesson Objectives

Spark Monitoring and Tuning

  • start the course
  • access the web user interface
  • use the Spark environment configuration parameters
  • use JSON to query monitoring tools for Spark
  • set JVM fractional memory amounts for Spark
  • modify speculation controls for Spark tasks
  • describe data serialization and the role it plays in the performance of Spark applications
  • describe memory management and consumption
  • determine executor memory allocation
  • describe garbage collection tuning
  • set the level of parallelism
  • use the broadcast functionality
  • use the query execution plan explainer
  • implement data compression on Parquet storage
  • monitor Spark applications

Overview/Description
In this course, you will learn about various ways to monitor Spark applications, such as web UIs, metrics, and other monitoring tools. You will also learn about memory tuning.

Target Audience
Programmers and developers wishing to perform big data development using Apache Spark 2.2.

Spark Security

Course Number: df_apsk_a05_it_enus

Lesson Objectives

Spark Security

  • start the course
  • secure the Spark UI by limiting access using a firewall
  • set permissions on the directory where the event logs are stored
  • configure the SSL settings
  • configure a shared secret for Spark authentication
  • configure spark.authenticate
  • enable SASL encryption for a Spark application
  • configure the primary ports Spark uses for communication
  • configure Spark security

Overview/Description
This course introduces Spark security. You will learn about securing the Spark UI and event logs and about configuring SSL settings. You will also learn about YARN deployments, SASL encryption, and network security.

Target Audience
Programmers and developers wishing to perform big data development using Apache Spark 2.2.

Structured Streaming

Course Number: df_apsk_a03_it_enus

Lesson Objectives

Structured Streaming

  • start the course
  • describe Structured Streaming
  • read stream input using readStream
  • write stream data using writeStream
  • apply window operations on event time
  • describe continuous applications in terms of Structured Streaming
  • implement deduplication with and without watermarking
  • store stream output to a directory using a file sink
  • use streaming query objects
  • manage streaming queries
  • enable checkpointing in Structured Streaming
  • use Structured Streaming to implement a word count on a text stream
  • describe the basics of Spark Streaming

Overview/Description
In this course, you will learn about the concepts of Structured Streaming, such as windowing, DataFrame, and SQL operations. You will also learn about file sinks, deduplication, and checkpointing.

Target Audience
Programmers and developers wishing to perform big data development using Apache Spark 2.2.
